Improving the Annotation of Sentence Specificity

نویسندگان

  • Junyi Jessy Li
  • Bridget O'Daniel
  • Yi Wu
  • Wenli Zhao
  • Ani Nenkova
چکیده

We introduce improved guidelines for annotation of sentence specificity, addressing the issues encountered in prior work. Our annotation provides judgements of sentences in context. Rather than binary judgements, we introduce a specificity scale which accommodates nuanced judgements. Our augmented annotation procedure also allows us to define where in the discourse context the lack of specificity can be resolved. In addition, the cause of the underspecification is annotated in the form of free text questions. We present results from a pilot annotation with this new scheme and demonstrate good inter-annotator agreement. We found that the lack of specificity distributes evenly among immediate prior context, long distance prior context and no prior context. We find that missing details that are not resolved in the the prior context are more likely to trigger questions about the reason behind events, “why” and “how”. Our data is accessible at http://www.cis.upenn.edu/%7Enlp/corpora/lrec16spec.html

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...

متن کامل

Multi-Document Summarization with Subjectivity Analysis

In this paper, we present our team TUT/NII results at DUC 2005 and additional experiments on improving multi-document summarization. Summarization systems have typically focused on the factual aspect of information needs. Subjectivity analysis is another essential aspect for better understanding of information needs. Our approach is based on sentence extraction, weighted by sentence type annota...

متن کامل

Modeling Concept-Attribute Structure

We apply hierarchical Latent Dirichlet Allocation (hLDA) to the problem of ontology annotation; automatically extending WORDNET with new concepts and annotating existing concepts with generic property fields, or attributes. The resulting annotations are evaluated along two dimensions: (1) the precision of the ranked lists of attributes at each concept, and (2) the specificity of the attribute a...

متن کامل

Selective Annotation of Sentence Parts: Identification of Relevant Sub-sentential Units

Many NLP tasks involve sentence-level annotation yet the relevant information is not encoded at sentence level but at some relevant parts of the sentence. Such tasks include but are not limited to: sentiment expression annotation, product feature annotation, and template annotation for Q&A systems. However, annotation of the full corpus sentence by sentence is resource intensive. In this paper,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016